A mark-specific quantile regression model

Summary Quantile regression has become a widely used tool for analysing competing risk data. However, quantile regression for competing risk data with a continuous mark is still scarce. The mark variable is an extension of cause of failure in a classical competing risk model where cause of failure is replaced by a continuous mark only observed at uncensored failure times. An example of the continuous mark variable is the genetic distance that measures dissimilarity between the infecting virus and the virus contained in the vaccine construct. In this article, we propose a novel mark-specific quantile regression model. The proposed estimation method borrows strength from data in a neighbourhood of a mark and is based on an induced smoothed estimation equation, which is very different from the existing methods for competing risk data with discrete causes. The asymptotic properties of the resulting estimators are established across mark and quantile continuums. In addition, a mark-specific quantile-type vaccine efficacy is proposed and its statistical inference procedures are developed. Simulation studies are conducted to evaluate the finite sample performances of the proposed estimation and hypothesis testing procedures. An application to the first HIV vaccine efficacy trial is provided.


Background
© 2023 Biometrika Trust. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Quantile regression provides a comprehensive description of different parts of the conditional distribution of responses (Koenker & Bassett, 1978), and it has become a widely used tool in survival analysis. For example, Powell (1984, 1986) modified the least absolute deviation procedure to analyse censored observations. Portnoy (2003) developed a recursively reweighted estimation procedure by using the principle of self-consistency for the Kaplan-Meier estimator. Peng & Huang (2008) proposed a recursive series of estimating equations for a sequence of quantiles based on the martingale feature associated with censored data. De Backer et al. (2019) suggested an adaptive method to analyse survival data through modifying the so-called check function.
Competing risk data are common in survival analysis. When the competing causes of failures are finite, Peng & Fine (2007) proposed a nonparametric quantile inference method for cause-specific failure probabilities. Peng & Fine (2009) presented a competing risk quantile regression based on the cause-specific cumulative incidence function. Sun et al. (2012) developed a generalized linear quantile regression for competing risk data when the failure type may be missing. More related works are Lee & Han (2016), Ahn & Kim (2018), Choi et al. (2018) and Farcomeni & Geraci (2020), among others. Competing risk models with continuous causes of failure, or marks, are useful in many important applications (Sun et al., 2009, 2020). Our research is motivated by a dataset from an HIV vaccine efficacy trial, in which the vaccine may only provide protection for HIV strains genetically similar to the HIV virus or viruses represented in the vaccine. The similarity between the infecting virus and the virus contained in the vaccine construct can be measured by the genetic divergence, or distance. Thus, the genetic divergence of infecting HIV viruses from the HIV strain represented in the vaccine needs to be taken into account to properly assess vaccine efficacy.
The mark variable is a measure of the genetic distance between two aligned HIV sequences, which is defined as the weighted percent mismatch of amino acids between the two HIV sequences. Since this distance may be unique for all infected subjects and the genetic diversity of HIV is extensive, it is natural to consider the mark as a continuous variable. Furthermore, during the observing period, the volunteers are potentially at risk of HIV infection from more than one mutually exclusive strain of the virus, and the mark is only observed when HIV infection occurs. If HIV infection does not occur then the mark is undefined and is not meaningful. Thus, this situation can be considered a competing risk setting, where causes of failure are replaced by a continuous mark only observed at uncensored failure times, and the mark is considered as continuous causes of failure (Sun et al., 2009). A preliminary analysis of the data is given in Fig. 1, which plots the curves of the mark-specific cumulative incidence functions separately for subjects stratified by treatment, age at the median values and behavioural risk score (Gilbert et al., 2008). Figure 1 is quite suggestive of the effects of age and behavioural risk score on the mark-specific cumulative incidence functions.
When analysing continuous mark data, the existing methods developed for discrete competing risks can no longer be applied. First, the mark is from a continuous distribution, and observations at a specified value of the mark are sparse. This feature of the data is very different from that of discrete competing risk data. In addition, inspecting Fig. 1 reveals that the effects of covariates on the conditional quantiles of the failure time may vary nonlinearly with the mark. However, the methods of Peng & Fine (2009), Sun et al. (2012), Ahn & Kim (2018), Choi et al. (2018) and Farcomeni & Geraci (2020) assume that the effects of covariates are constant at each given quantile level. Therefore, suitable methods are needed to analyse the varying effects of covariates with the mark. Moreover, although marginal quantile regression methods such as those in Portnoy (2003) and Peng & Huang (2008) can be used to assess the vaccine efficacy, they may fail to reveal the important relation between the vaccine efficacy and infecting viruses if the mark is ignored. A toy example is given near the end of § 1.3.
Fig. 1. The estimated mark-specific cumulative incidence functions for the vaccine trial data. The mark-specific cumulative incidence function (Gilbert et al., 2008) is defined as $F_v(t) = \lim_{h \to 0} h^{-1} P(T \le t, V \in [v, v+h])$, where $T$ is the time of HIV infection and $V$ denotes the mark, i.e., the weighted percent mismatch of amino acids.
The aim of this paper is to develop a quantile regression methodology for analysing survival data with continuous marks and assessing the HIV vaccine efficacy. The proposed method allows the covariate effects to vary nonlinearly with the mark.

Mark-specific quantile regression model
Let $T$ be the failure time of interest and $C$ be the censoring time. Let $V$ denote a continuous mark variable and $Z = (1, \tilde Z^{T})^{T}$, where $\tilde Z$ is a $p$-dimensional covariate vector. Assume that $C$ is independent of $(T, V)$ given $Z$. Furthermore, denote by $X = \min(T, C)$ the observed time and by $\Delta = I(T \le C)$ the censoring indicator, where $I(\cdot)$ denotes an indicator function. The mark is observed only when the corresponding failure time is uncensored. If $\Delta = 0$, then $V$ is undefined and is not meaningful. The conditional mark-specific cumulative incidence function is defined as

$$F_v(t \mid Z) = \lim_{h \to 0} h^{-1} P(T \le t, V \in [v, v+h] \mid Z).$$

Suppose that $F_v(t \mid Z)$ attains the level $\bar\tau$ for some constant $\bar\tau > 0$, and that the support of mark $V$ is taken to be $[0, 1]$, rescaling $V$ if necessary. For $v \in [0, 1]$ and $\tau \in (0, \bar\tau)$, we define the $\tau$th conditional mark-specific quantile by

$$Q_v(\tau \mid Z) = \inf\{t : F_v(t \mid Z) \ge \tau\}.$$

Under the competing risk framework, mark $V$ is only meaningfully defined when failure occurs and it cannot be treated as a covariate. The proposed conditional mark-specific quantile $Q_v(\tau \mid Z)$ differs from the conditional quantile $Q(\tau \mid Z, v)$ of $T$ given $Z$ and $V = v$; the conditional distribution of $T$ given $(Z, V = v)$, and therefore $Q(\tau \mid Z, v)$, is not identifiable under the competing risk setting.
The conditional mark-specific cumulative incidence function is an extension of the cause-specific cumulative incidence function in a competing risk setting, where the cause of the failure time is replaced by a continuous mark only observed at the failure time (Gilbert et al., 2008). Thus, the conditional mark-specific quantile function is an extension of the cause-specific quantile function in a competing risk setting for a continuous cause of failure, and is defined analogously to that for the competing risk data with finitely many competing risks (Peng & Fine, 2009). The conditional mark-specific quantile function $Q_v(\tau \mid Z)$ can be interpreted as the earliest time given covariate $Z$ at which the proportion of subjects whose failures have occurred with mark $V = v$ exceeds $\tau$. For the HIV vaccine efficacy trials, $Q_v(\tau \mid Z)$ can be interpreted as the first time given covariate $Z$ that the proportion of volunteers who have been infected with HIV with mark $V = v$ exceeds $\tau$. Because $F_v(t \mid Z)$ is a density in $v$ rather than a probability, the range of $\tau$ is not necessarily bounded by 1. This is different from the competing risk quantile regression of Peng & Fine (2009) for the discrete mark, in which case the quantile level is bounded by 1. See the discussion of tuning parameter selection for more details about the choice of the upper quantile.
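The "earliest time at which the incidence curve reaches level τ" reading of the mark-specific quantile can be illustrated with a short sketch. It assumes that the incidence curve at a fixed mark and covariate value has already been estimated as a right-continuous step function; the function name `step_quantile` and the numerical values are hypothetical and serve only as an illustration.

```python
from bisect import bisect_left


def step_quantile(times, cif_values, tau):
    """Return inf{t : F(t) >= tau} for a right-continuous step function F.

    times      -- increasing jump times of the estimated incidence curve
    cif_values -- nondecreasing values F(times[k]) (hypothetical estimates)
    tau        -- target level; for a mark-specific curve it need not be < 1
    """
    # cif_values is nondecreasing, so bisect_left returns the first index
    # whose value is >= tau.
    k = bisect_left(cif_values, tau)
    if k == len(cif_values):
        return None  # the curve never reaches tau, so the quantile is undefined
    return times[k]


# Hypothetical estimated curve at one mark value and one covariate stratum.
times = [1.0, 2.5, 4.0, 6.0]
cif = [0.02, 0.05, 0.11, 0.15]
print(step_quantile(times, cif, 0.05))
print(step_quantile(times, cif, 0.20))
```

Returning `None` when the level is never reached mirrors the requirement that the quantile level stay below the maximum attained by the incidence curve.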
We propose a novel mark-specific quantile regression model to capture the nonlinear interaction effects between the covariates and the mark on the failure. Specifically, for $v \in [0, 1]$ and $\tau \in (0, \bar\tau)$, the model postulates that

$$Q_v(\tau \mid Z) = \exp\{Z^{T} \beta^*_\tau(v)\}, \qquad (1)$$

where $\beta^*_\tau(v)$ is a $(p+1)$-dimensional vector of unknown continuous functions of $v$ and $\tau$, and characterizes the varying effects of $Z$ on the conditional mark-specific quantile of the failure time with respect to $V$. By setting the first component of $Z$ as 1, model (1) has a nonparametric baseline function $\exp\{\beta^*_{0\tau}(v)\}$. Model (1) has a similar form to the varying-coefficient quantile regression model (Kim, 2007) in the absence of censored data. However, because the corresponding quantile level $\tau^*(Z, v)$ depends on $(Z, v)$, and $f_V(v \mid Z)$ is not identifiable under the competing risk setting, the method of Kim (2007) cannot be directly applied to estimate $\beta^*_{\tau^*(Z,v)}(v)$ or $\beta^*_\tau(v)$ for model (1). In this paper, we develop an induced smoothing procedure (Brown & Wang, 2007) to estimate $\beta^*_\tau(v)$ under model (1), which is fast to implement using widely available numerical methods, such as the Newton-Raphson algorithm.

Quantile-type vaccine efficacy
The proposed model has applications in the sieve analysis of vaccine efficacies. To evaluate the HIV vaccine efficacy, write the covariate as $Z = (1, Z_1, Z_2^{T})^{T}$, where $Z_1$ is the treatment (vaccine) group indicator and $Z_2$ is a vector of other covariates. We define the mark-specific quantile-type vaccine efficacy as

$$\mathrm{qve}_\tau(v) = \log\left\{\frac{Q_v(\tau \mid Z_1 = 1, Z_2)}{Q_v(\tau \mid Z_1 = 0, Z_2)}\right\}.$$

Function $\mathrm{qve}_\tau(v)$ characterizes the nonlinear dependence on $v$ of the ratio of the conditional mark-specific quantile of the failure time at level $\tau$ under vaccine assignment ($Z_1 = 1$) compared to under placebo assignment ($Z_1 = 0$). A positive value of $\mathrm{qve}_\tau(v)$ indicates that it takes a longer time to reach the same percentage ($\tau$) of the mark-specific infections/diseases for the vaccine group as opposed to the placebo group. The larger the value of $\mathrm{qve}_\tau(v)$, the more effective the vaccine. It is close to zero if and only if the conditional mark-specific quantiles of the failure time have no clear differences between the vaccine and placebo groups. Here we focus on the log-linear model for $Q_v(\tau \mid Z)$, under which $\mathrm{qve}_\tau(v) = \beta^*_{1\tau}(v)$ does not depend on $Z_2$, but depends on the mark value $v$, which simplifies the inference procedure for $\mathrm{qve}_\tau(v)$.
The mark-specific quantile regression model (1) complements the modelling approaches based on the mark-specific hazard functions (Sun et al., 2009, 2020; Han et al., 2017) by allowing covariate effects to vary over $\tau$. It also complements the marginal quantile regression models (Portnoy, 2003; Peng & Huang, 2008) by providing additional insights on how the relation between the quantile and covariates changes with the mark. To illustrate the difference with the marginal quantile regression model, we consider a toy example. Let […] That is, the important vaccine effects can be missed without the consideration of the mark in this case.
We also consider a cumulative version of $\mathrm{qve}_\tau(v)$, which is defined as $\mathrm{cqve}_\tau(v) = \int_a^v \mathrm{qve}_\tau(u)\,du$ with $0 < a < 1$. The quantity can be used to assess the vaccine efficacy over a range of marks for $v \in [a, b] \subset (0, 1)$ and quantile levels $\tau \in [\tau_0, \tau_U]$. We construct simultaneous confidence bands for $\mathrm{cqve}_\tau(v)$, and propose test statistics to evaluate the mark-specific vaccine efficacy based on the estimator of $\mathrm{cqve}_\tau(v)$ for $v \in [a, b] \subset (0, 1)$.
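As an illustration of the cumulative version, the integral of the efficacy curve from $a$ up to each grid point can be approximated numerically. The sketch below uses the trapezoidal rule on a grid of marks; the function name `cumulative_qve`, the grid and the linear efficacy curve are hypothetical and are not taken from the analysis in this paper.

```python
def cumulative_qve(marks, qve_values):
    """Trapezoidal approximation of cqve(v) = integral from a to v of qve(u) du.

    marks      -- increasing grid of mark values; marks[0] plays the role of a
    qve_values -- values of the efficacy curve at the grid points
    Returns a list with one cumulative value per grid point.
    """
    out = [0.0]
    for k in range(1, len(marks)):
        width = marks[k] - marks[k - 1]
        out.append(out[-1] + 0.5 * width * (qve_values[k] + qve_values[k - 1]))
    return out


# Hypothetical decreasing efficacy curve on a grid over [0.3, 0.8].
grid = [0.3 + 0.1 * k for k in range(6)]
qve = [0.5 - 0.4 * v for v in grid]
cqve = cumulative_qve(grid, qve)
print(cqve[-1])
```

For a linear curve the trapezoidal rule is exact up to rounding, which makes this a convenient check of the routine before applying it to estimated curves.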

Induced smoothing estimators
Suppose that we observe $n$ independent and identically distributed copies of $(X, \Delta, \Delta V, Z)$, denoted by $(X_i, \Delta_i, \Delta_i V_i, Z_i)$, $i = 1, \ldots, n$. In what follows, assume that the […]. Let $N_i(t, v) = I(X_i \le t, \Delta_i = 1, V_i \le v)$ be the marked point counting process with a jump at an uncensored failure time $X_i$ and the associated mark $V_i$. […], which suggests that we can borrow strength from data in a neighbourhood of a mark. For each $v \in (0, 1)$, we propose the following mark-specific localized estimating equation to estimate $\beta^*_\tau(v)$: […] Here $L$ is the follow-up time satisfying […], $K(\cdot)$ is a kernel function with support on $(-1, 1)$ and $h$ is a bandwidth. Since $U_n(\xi)$ is monotone, but not continuous, an exact zero crossing of $U_n(\xi)$ may not exist. To be more specific, Fig. S1 in the Supplementary Material presents $U_n\{\beta_\tau(v)\}$ […], where […] are assumed to be known. The sample size is 1500 and the bandwidth is 0.2. It can be seen that $U_n\{\beta_\tau(v)\}$ is very jagged and may flatten at 0.43, the value of $\beta^*_{1\tau}(0.5)$, which results in numerical challenges in computing the solution of $U_n(\xi)$, particularly with multiple covariates. In addition, since $U_n(\xi)$ is nondifferentiable, the variance estimation of the resulting estimators can be very difficult.
To address these issues, we next propose an induced smoothing method to approximate $U_n(\xi)$ using continuously differentiable functions (Brown & Wang, 2007). Specifically, let […], where $\|D\|_F$ denotes the Frobenius norm of any matrix $D$. Similarly to Brown & Wang (2007), […] where $W$ is a random vector from $N(0, I_{p+1})$ independent of $(X_i, \Delta_i, \Delta_i V_i, Z_i)$, and $I_{p+1}$ denotes the identity matrix of size $p + 1$. A direct calculation shows that […] where $\Phi(x)$ denotes the cumulative distribution function of the standard normal distribution and […]. In practice, the survival function $G(t \mid Z)$ is usually unknown, but can be estimated from the observed data. For instance, if the censoring is dictated by administrative decisions or appears to be independent of the covariates, we can use the Kaplan-Meier estimator for the censoring distribution. When the censoring depends on the covariates, we can estimate $G(t \mid Z)$ by specifying a semiparametric regression model for the censoring time, such as the Cox model. Here, for notational simplicity, we just consider the censoring to be independent of the covariates, but the estimation procedures are similar for the case when the censoring depends on the covariates. Let $G(t)$ be the survival function of the censoring time; we focus on the Kaplan-Meier estimator of $G(t)$, denoted $\hat G(t)$. In addition, in the numerical studies below, we take the smoothing matrix to be $I_{p+1}$ for computational convenience.
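The core idea of induced smoothing in the sense of Brown & Wang (2007) is that averaging an indicator over a small Gaussian perturbation of the coefficient vector yields a normal distribution function, which is smooth in the coefficients. The sketch below illustrates this identity for a single indicator of the form $I(\log x \le z^{T}\beta)$, assuming an identity smoothing matrix and a perturbation of order $n^{-1/2}$. The function names and numerical inputs are hypothetical, and the code is not the full estimating function of this paper.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothed_indicator(log_time, z, beta, n):
    """Closed form of E[ I(log_time <= z'(beta + W / sqrt(n))) ], W ~ N(0, I).

    Since z'W / sqrt(n) is N(0, z'z / n), the expectation equals
    Phi((z'beta - log_time) / sqrt(z'z / n)).
    """
    linear = sum(zj * bj for zj, bj in zip(z, beta))
    scale = math.sqrt(sum(zj * zj for zj in z) / n)
    return std_normal_cdf((linear - log_time) / scale)


def monte_carlo_indicator(log_time, z, beta, n, draws=20000, seed=1):
    """Monte Carlo average of the perturbed indicator, for comparison."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        perturbed = [b + rng.gauss(0.0, 1.0) / math.sqrt(n) for b in beta]
        if log_time <= sum(zj * pj for zj, pj in zip(z, perturbed)):
            hits += 1
    return hits / draws


z = [1.0, 1.0, 0.5]          # hypothetical covariate vector with intercept
beta = [0.2, 0.43, 0.1]      # hypothetical coefficient values
print(smoothed_indicator(0.70, z, beta, 1500))
print(monte_carlo_indicator(0.70, z, beta, 1500))
```

The two printed values should agree up to Monte Carlo error, which illustrates why the smoothed estimating function can be differentiated and solved with Newton-type methods while the raw indicator cannot.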
Let Ŝn (b) denote the estimating equation by replacing

Asymptotic properties
We establish the uniform consistency and asymptotic normality of $\hat\beta_\tau(v)$ for $v \in [a, b]$ and $\tau \in [\tau_0, \tau_U]$, where $0 < a < b < 1$ and $0 < \tau_0 < \tau_U < \bar\tau$. We assume that the following conditions hold.
Condition 1. We have $P(X \ge L) > 0$, and $Z$ is bounded almost surely.
Condition 3. The conditional density function $f(t, v \mid z)$ of $(T, V)$ given $Z = z$ is twice continuously differentiable with respect to $t$ and $v$, and is bounded uniformly in […].
Condition 4. […] where $\lambda_{\min}(\cdot)$ denotes the minimum eigenvalue of a matrix, and […].
Condition 5. The kernel function $K(x)$ is symmetric with support $[-1, 1]$, and has bounded variation satisfying $\int K(u)\,du = 1$.
Condition 6. The bandwidth satisfies $nh^2 \to \infty$ and $nh^5 \to 0$.
Conditions 1-4 are standard assumptions for quantile regression methods in the context of survival analysis, which are analogous to those in Peng & Huang (2008) and Qian & Peng (2010). Conditions 2 and 3 are needed for the uniform consistency and the asymptotic normality of $\hat\beta_\tau(v)$. Condition 4 ensures the identifiability of $\beta^*_\tau(v)$. Conditions 5 and 6 are standard assumptions for kernel smoothing techniques. The uniform consistency and asymptotic normality of $\hat\beta_\tau(v)$ are given in the following two theorems.
In order to estimate the asymptotic covariance matrix of $\hat\beta_\tau(v)$, we need to estimate $A_\tau(v)$ and $D(v, \tau)$. First, by checking the proofs of Theorems 1 and 2, we can consistently estimate $A_\tau(v)$ by […] where $\phi(x)$ is the density function of the standard normal distribution. In addition, we can consistently estimate $D(v, \tau)$ by […] where […] Thus, the asymptotic covariance matrix can be consistently estimated by […].

Remark 1. Theorem 2 implies that the covariance matrix of $\hat\beta_\tau(v)$ is not affected asymptotically by $\hat G(t)$. One intuitive explanation is that $\hat G(t)$ is irrelevant to the mark in estimating $\hat\beta_\tau(v)$, and the former converges at a faster rate. Similar results have also been obtained by Zhang et al. (2022) for quantile regression models, where the inference on the parameters of interest is not affected asymptotically by the estimation of nuisance parameters.
However, this procedure is very time consuming for estimating β * τ (v) with respect to v and τ .
Remark 3. Our proposed method can be extended to other link functions for $Q_v(\tau \mid Z)$. A more general approach is to generalize model (1) to $Q_v(\tau \mid Z) = H\{Z^{T} \beta^*_\tau(v)\}$, where $H(\cdot) > 0$ is a known monotone link function (Peng & Fine, 2009; Sun et al., 2012). Under this setting, the estimating equation takes the form […] where $\dot H(x)$ denotes the first derivative of $H(x)$. Then Theorems 1 and 2 still hold by replacing $A_\tau(v)$ and $D(v, \tau)$ with $A^*_\tau(v)$ and $D^*(v, \tau)$, respectively.
and C (t) is the cumulative hazard function of the censoring time.The following theorem establishes the weak convergence of Bτ (v).

Tuning parameter selection
We provide some practical guidance on how to select the tuning parameters, including the bandwidth parameter used in induced smoothing, the range of quantile levels and the range of $V$. Bandwidth selection is often a critical part of nonparametric regression. Here we use an $M$-fold cross-validation method to choose the bandwidth parameter (Tian et al., 2005). Specifically, we randomly divide the data into $M$ roughly equal-sized groups. Let $D_k$ denote the $k$th subgroup of data and $\hat\beta^{(-k)}_\tau(v)$ be an estimate of $\beta^*_\tau(v)$ using the data from all subgroups other than $D_k$. Under model (1), we have […] Thus, the $k$th prediction error is given by […] The optimal bandwidth is obtained as […] In practice, however, the cross-validation method may be time consuming. Alternatively, we could use a rule-of-thumb bandwidth $h$ proportional to $\hat\sigma_V n_0^{-1/4}$, where $\hat\sigma_V$ is the estimated standard error of the observed marks, $n_0$ is the number of observed failure times and the proportionality constant is a prespecified positive number (Sun et al., 2009; Han et al., 2017). Simulation results presented in § 4 show that the bandwidth $h_{\mathrm{opt}}$ leads to smaller estimated standard deviations in estimation and higher power for the tests.
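The two bandwidth choices described above can be sketched as follows. The first function evaluates a rule-of-thumb bandwidth of the form constant × spread of the observed marks × $n_0^{-1/4}$, and the second produces the random partition needed for $M$-fold cross-validation. The function names are hypothetical, the constant is a user-supplied assumption, and the sample standard deviation is used here as the spread measure; the prediction-error criterion itself is not reproduced.

```python
import random
import statistics


def rule_of_thumb_bandwidth(observed_marks, constant):
    """Rule-of-thumb bandwidth: constant * sd(marks) * n0 ** (-1/4).

    observed_marks -- marks of the uncensored failures (n0 values)
    constant       -- prespecified positive constant (an assumption here)
    """
    n0 = len(observed_marks)
    sigma_v = statistics.stdev(observed_marks)  # sample standard deviation
    return constant * sigma_v * n0 ** (-0.25)


def m_fold_indices(n, m, seed=0):
    """Randomly split indices 0..n-1 into m roughly equal-sized groups."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Taking every m-th shuffled index gives group sizes that differ by at most 1.
    return [indices[k::m] for k in range(m)]


marks = [0.10, 0.12, 0.15, 0.18, 0.21, 0.25]   # hypothetical observed marks
print(rule_of_thumb_bandwidth(marks, constant=4.0))
print(m_fold_indices(11, 5))
```

In a cross-validation loop, each group in turn would be held out, the model fitted on the remaining groups for every candidate bandwidth, and the bandwidth with the smallest total prediction error retained.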
For the range of quantile levels, the lower and upper quantiles, $\tau_0$ and $\tau_U$, are required to satisfy the conditions that there is no censoring below the $\tau_0$th quantile and $\tau_U < \min_{v,z} \max_t F_v(t \mid Z = z)$. In practice, $\tau_0$ can be chosen to be close to 0 if censoring occurs at early stages (He et al., 2022). The upper quantile $\tau_U$ can be chosen such that […] For a discrete covariate $Z$ or a stratified version based on $Z$, $F_v(t \mid Z = z)$ can be estimated by […] By plotting the mark-specific cumulative incidence function estimates stratified on covariates as in Fig. 1, for example, we can choose $\tau_U$ such that all estimated curves exceed it in the right tails.
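The requirement that the upper quantile lie below the smallest of the curve maxima can be checked numerically once stratified incidence estimates are available. The following sketch assumes the estimated curves are stored as lists of values keyed by a (mark, stratum) pair; the function name `upper_quantile_bound` and all numbers are hypothetical.

```python
def upper_quantile_bound(curves):
    """Return min over (mark, stratum) of max over time of the estimated curve.

    curves -- dict mapping a (mark, stratum) key to the list of estimated
              incidence values over time (hypothetical inputs)
    Any admissible upper quantile must be strictly below this bound.
    """
    return min(max(values) for values in curves.values())


# Hypothetical estimated curves for two mark values and two strata.
curves = {
    (0.10, "vaccine"): [0.01, 0.08, 0.19],
    (0.10, "placebo"): [0.02, 0.10, 0.22],
    (0.20, "vaccine"): [0.01, 0.07, 0.17],
    (0.20, "placebo"): [0.03, 0.12, 0.25],
}
bound = upper_quantile_bound(curves)
print(bound)
print(0.15 < bound)  # checks one candidate upper quantile against the bound
```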
For the range of $V$, as discussed in Sun et al. (2009), we assume that the mark variable $V$ has a known and bounded support, and that, without loss of generality, this support is taken to be $[0, 1]$, rescaling $V$ if necessary. In practice, the range of $V$ can be taken to be $(V_{\min}, V_{\max})$ and $[a, b] \subset (V_{\min}, V_{\max})$, where $V_{\min} = \min_{i:}$ […]

Inference for vaccine efficacy
The confidence bands for the regression coefficients and the mark-specific quantile-type vaccine efficacy are provided in the Supplementary Material. Here we test the mark-specific quantile-type vaccine efficacy. If the vaccine has no efficacy then the vaccine will provide no protection against any infecting strain of the virus. As a result, the $\tau$th mark-specific quantiles should have no significant difference between the vaccine and placebo groups for all $v \in [a, b]$ and $\tau \in [\tau_0, \tau_U]$. That is, $\mathrm{qve}_\tau(v) \equiv 0$ for all $(v, \tau) \in B$. Thus, it is of interest to test the efficacy over a range of $v$ and $\tau$ to assess the overall clinical/public health benefit of the vaccine. The first set of hypotheses is $H_{10}$: $\mathrm{qve}_\tau(v) = 0$ for all $(v, \tau) \in B$ versus $H_{11}$: $\mathrm{qve}_\tau(v) \ne 0$ for some $(v, \tau) \in B$, or $H_{12}$: for each given $\tau$, $\mathrm{qve}_\tau(v) \ge 0$ with strict inequality for at least some $v$.
To test $H_{10}$, we consider the statistics […] and $e_1 = (0, 1, 0, \ldots, 0)^{T} \in R^{p+1}$. The test statistic $T_{11}$ captures the general departures $H_{11}$, while $T_{12}(\tau)$ is sensitive to the alternative $H_{12}$, and is likely to be positive when $H_{12}$ holds. The statistics $T_{11}$ and $T_{12}$ are close to zero when $\mathrm{qve}_\tau(v) \equiv 0$, and hence we reject $H_{10}$ if $T_{11} > c_{11}(\alpha)$ and $T_{12}(\tau) > c_{12}(\alpha)$, where $c_{11}(\alpha)$ and $c_{12}(\alpha)$ are the critical values. By some arguments similar to those of § 2.2, under $H_{10}$, $n^{1/2}\,\widehat{\mathrm{cqve}}_\tau(v)$ is asymptotically equivalent to $n^{-1/2} \sum_{i=1}^{n} \theta_{0i}(\tau, v)$, where $\theta_{0i}(\tau, v)$ is defined in (3). Thus, $c_{12}(\alpha)$ can be taken as the upper $\alpha$-quantile $z_\alpha$ of the standard normal distribution. To obtain the critical value $c_{11}(\alpha)$, we consider a resampling technique (Lin et al., 1993). Let […] where $W_i$, $i = 1, \ldots, n$, are independent standard normal variables and are independent of the observed data. According to the arguments of Lin et al. (1993), the null distribution of $T_{11}$ can be approximated by the conditional distribution of $T^*_{11}$ given the observed data, which can be obtained by repeatedly generating the normal random sample $W_i$, $i = 1, \ldots, n$, while fixing the observed data. Thus, the critical value $c_{11}(\alpha)$ can be taken as the $(1 - \alpha)$-percentile of the conditional distribution of $T^*_{11}$.

When the null $H_{10}$ is rejected, one may wish to test whether the quantile-type vaccine efficacy varies with respect to $v$ and $\tau$. In addition, if the vaccine is effective then the vaccine will afford protection against the infecting strain of the virus, and the vaccine efficacy will decrease as $v$ increases (Sun et al., 2009). This leads to $\mathrm{qve}_\tau(v)$ decreasing with $v$ for each given $\tau$. Thus, we consider the following set of hypotheses: $H_{20}$: $\mathrm{qve}_\tau(v) = \psi$ for all $(v, \tau) \in B$, with $\psi$ some unspecified constant, versus $H_{21}$: $\mathrm{qve}_\tau(v) \ne \psi$ for some $(v, \tau) \in B$, or $H_{22}$: for each given $\tau$, $\mathrm{qve}_\tau(v)$ decreases as $v$ increases.
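The resampling step in the spirit of Lin et al. (1993) can be sketched generically: the estimated influence terms are held fixed, multiplied by fresh standard normal weights, and the resulting statistic is recomputed many times to obtain an empirical critical value. The sketch below assumes a supremum-type statistic over a grid, which is an illustrative choice rather than the exact form of the test statistic; the function name and inputs are hypothetical.

```python
import math
import random


def resampled_critical_value(theta, alpha=0.05, n_resamples=1000, seed=0):
    """Empirical (1 - alpha) quantile of sup_g | n^{-1/2} sum_i theta[i][g] W_i |.

    theta -- theta[i][g] is the (hypothetical) estimated influence term of
             subject i at grid point g; the data are held fixed
    W_i   -- independent standard normal multipliers, redrawn each resample
    """
    rng = random.Random(seed)
    n = len(theta)
    n_grid = len(theta[0])
    sups = []
    for _ in range(n_resamples):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        process = [
            sum(theta[i][g] * w[i] for i in range(n)) / math.sqrt(n)
            for g in range(n_grid)
        ]
        sups.append(max(abs(x) for x in process))
    sups.sort()
    k = min(n_resamples - 1, math.ceil((1.0 - alpha) * n_resamples) - 1)
    return sups[k]


# Hypothetical influence terms for 50 subjects on a 3-point grid.
gen = random.Random(42)
theta = [[gen.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
print(resampled_critical_value(theta))
```

The observed statistic would then be compared with the returned value, rejecting the null hypothesis when it is exceeded.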
Let $a_1$ and $\tau^*_0$ be two specified constants such that $a < a_1 < b$ and $\tau_0 < \tau^*_0 < \tau_U$. To test $H_{20}$, we propose the test statistics […] and […] where […] Here, $\theta_{1i}(\tau, v)$ and $\theta_{2i}(\tau)$ are defined as […] We further define […] When $\mathrm{qve}_\tau(v)$ is a constant function, $T_{21}$ and $T_{22}(\tau)$ are likely to be close to zero.
Then the null distribution of $T_{21}$ can be approximated by the conditional distribution of $T^*_{21}$ given the observed data, and the critical value $c_{21}(\alpha)$ can be taken as the $(1 - \alpha)$-percentile of the conditional distribution of $T^*_{21}$.

Simulation studies
In this section, we conduct simulation studies to evaluate the finite sample performance of the proposed method. We first generate $Z^* = (Z^*_1, Z^*_2)$ from a multivariate normal distribution with mean 0, variance 1 and correlation 0.5. Then let $\tilde Z_1 = I(Z^*_1 > 0)$ and $\tilde Z_2 = \Phi(Z^*_2)$. Under these settings, $\tilde Z_1$ follows a Bernoulli distribution with success probability 0.5, $\tilde Z_2$ follows a uniform distribution $U(0, 1)$ and $\tilde Z_1$ is related to $\tilde Z_2$. The conditional density of $V$ equals 1 if $\mu \tilde Z_1 = 0$ and is $2v$ otherwise, where $\mu$ is a constant given below. The failure time […] Under the preceding settings, the underlying quantile regression model takes the form […] Here, we set $\gamma_1(v) = \gamma_{11} + \gamma_{12} v$ and $\gamma_2(v) = 0.5(1 + v^2)$. In the study, four models are considered: […] Model M1 is considered for the null hypothesis $H_{10}$ of no vaccine efficacy, and models M2-M4 are considered for the alternative hypotheses $H_{11}$ and $H_{12}$. The departure from $H_{10}$ increases as the model moves from M2 to M4. Model M2 is also considered for the null hypothesis $H_{20}$ of constant vaccine efficacy, while M3-M4 are considered for the alternatives $H_{21}$ and $H_{22}$.

For the calculations of $\widehat{\mathrm{cqve}}_\tau(v)$ and the tests studied in § 3, we take a grid of 50 evenly spaced points in $[a, b]$ and $[\tau_0, \tau_U]$. In the test statistics $T_{21}$ and $T_{22}(\tau)$, the parameters $a_1$ and $\tau^*_0$ are taken as 0.35 and 0.15, respectively. The kernel function is set to be the Epanechnikov kernel function, that is, $K(x) = 0.75(1 - x^2) I(|x| < 1)$. The bandwidth $h$ is chosen using the five-fold cross-validation method, and the optimal bandwidth is denoted by $h_{\mathrm{opt}}$. For a sensitivity analysis, we also consider $h = 0.15$ or 0.2 by using the rule-of-thumb bandwidth. The critical values are calculated using the resampling method with 1000 simulated realizations. The results presented below are based on 1000 replications with sample sizes $n = 1000$ and 1500.
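Parts of the data-generating mechanism described above can be sketched directly: the correlated covariates, the two mark densities, the exponential censoring time and the Epanechnikov kernel. The failure-time model is not reproduced here. The function names, the seed and the censoring mean used in the example call are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def epanechnikov(x):
    """Epanechnikov kernel K(x) = 0.75 (1 - x^2) on |x| < 1, zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) < 1.0 else 0.0


def draw_covariates(rng, rho=0.5):
    """Draw (Z1, Z2): Z1 = I(Z1* > 0), Z2 = Phi(Z2*), with corr(Z1*, Z2*) = rho."""
    z1_star = rng.gauss(0.0, 1.0)
    z2_star = rho * z1_star + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return (1 if z1_star > 0 else 0), std_normal_cdf(z2_star)


def draw_mark(rng, linear_density):
    """Draw a mark on [0, 1): density 2v if linear_density, else uniform."""
    u = rng.random()
    # The density 2v has distribution function v^2, so its inverse is sqrt(u).
    return math.sqrt(u) if linear_density else u


def draw_censoring(rng, mean_c):
    """Exponential censoring time with mean mean_c."""
    return rng.expovariate(1.0 / mean_c)


rng = random.Random(2023)
z1, z2 = draw_covariates(rng)
print(z1, z2, draw_mark(rng, linear_density=(z1 == 1)), draw_censoring(rng, 2.0))
```

Which covariate configuration triggers the linear mark density depends on the constant in the simulation design, so it is left to the caller through the `linear_density` flag.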
Table 1 and the tables in the Supplementary Material report the empirical biases, the empirical standard deviations, the average of the estimated standard deviations of $\widehat{\mathrm{qve}}_\tau(v)$ and $\widehat{\mathrm{cqve}}_\tau(v)$, and the coverage probabilities of the pointwise 95% confidence bands for $\mathrm{qve}_\tau(v)$ and $\mathrm{cqve}_\tau(v)$. The results suggest that the proposed estimators perform reasonably well. Specifically, the proposed estimators are nearly unbiased, the estimated standard deviations agree well with the empirical standard deviations and the coverage probabilities of the pointwise 95% confidence bands are close to the nominal level. The performance of the proposed estimators becomes better when the sample size increases from 1000 to 1500. In addition, the optimal bandwidth $h_{\mathrm{opt}}$ leads to smaller estimated standard deviations for $\widehat{\mathrm{cqve}}_\tau(v)$ and $\widehat{\mathrm{qve}}_\tau(v)$ than $h = 0.15$ and 0.2. In the Supplementary Material further figures display the estimated curves of $\mathrm{qve}_\tau(v)$ and $\mathrm{cqve}_\tau(v)$ with the pointwise and simultaneous confidence bands under models M1-M4 for $h_{\mathrm{opt}}$ and $n = 1000$. It can be seen that the estimated curves are close to their true curves and that the confidence bands cover the entire true curves. The results for other settings are similar. In addition, in the Supplementary Material we also display the estimated curves of $\beta^*_{1\tau}(v)$ and $\beta^*_{2\tau}(v)$ with the pointwise confidence intervals under models M1-M4 for $h_{\mathrm{opt}}$ and $n = 1000$. The results suggest that the estimators $\hat\beta_{1\tau}(v)$ and $\hat\beta_{2\tau}(v)$ are nearly unbiased, and that the confidence intervals also cover the entire true curves.
To investigate the performance of the test procedures, we also provide the empirical sizes and powers of the test statistics $T_{11}$, $T_{12}(\tau)$, $T_{21}$ and $T_{22}(\tau)$ at a significance level of 0.05. The results are reported in Table 2. We see that the empirical sizes of all tests are close to the nominal 5% level, and all tests have reasonable powers to detect deviations from the null hypothesis. The powers of all tests increase as the simulation model moves from M2 to M4. In addition, the tests with the optimal bandwidth achieve higher powers than those with $h = 0.15$ and 0.2. Furthermore, we observe that the estimated standard deviation of $\widehat{\mathrm{cqve}}_\tau(v)$ with $h = 0.2$ is smaller than that with $h = 0.15$, which suggests that the empirical powers of the tests increase as the bandwidth varies from 0.15 to 0.2. This is because a larger bandwidth usually leads to a smaller estimated standard deviation of $\widehat{\mathrm{cqve}}_\tau(v)$, but the biases remain approximately the same, resulting in increased power for the larger bandwidth. Such a phenomenon was also observed by Sun et al. (2009) under the mark-specific proportional hazards model, which may be associated with the convergence rate of the normalized $\widehat{\mathrm{cqve}}_\tau(v)$ to a Gaussian process.

Real data analysis
For illustration purposes, we applied the proposed method to a dataset from the first HIV vaccine efficacy trial. The trial was carried out in North America and The Netherlands, and 5403 HIV-negative volunteers at risk of acquiring HIV infection were enrolled (Flynn et al., 2005). Volunteers were randomly assigned to receive either a recombinant glycoprotein 120 vaccine, AIDSVAX, or a placebo in a 2:1 ratio, and were monitored for HIV infection at semi-annual HIV testing visits for 36 months. The vaccine may only provide protection for HIV strains genetically similar to the HIV virus or viruses represented in the vaccine. The similarity between the infecting virus and the virus contained in the vaccine construct was measured by the genetic distance, which was defined as the percent mismatch of amino acids between two aligned HIV sequences. Since the HIV-gp120 region contained neutralizing epitopes that potentially induced anti-HIV antibody responses that prevented HIV infection (Wyatt et al., 1998), we defined mark $V$ as the percent mismatch of amino acids in the whole gp120 region (581 amino acids long), where all possible mismatches of particular pairs of amino acids, e.g., A versus C, were weighted by the estimated probability of interchange; see the 2005 technical report from the University of Washington by D. C. Nickle et al. During the trial, 368 individuals were infected with HIV, but 32 individuals had missing marks. Since our proposed method cannot be directly applied to handle missing data, we removed the 32 individuals with a missing mark, and focused on the analysis of the remaining 336 samples, as in Sun et al. (2009) and Han et al. (2017). Each of the remaining 336 samples, 217 vaccine and 119 placebo, had a unique mark, and mark $V$ ranged from 0.059 to 0.261.
Following Sun et al. (2009) and Han et al. (2017), we considered three covariates: treatment indicator, $Z_1$, taking value 1 if the volunteer was in the vaccine group and 0 otherwise; age at enrolment, $Z_2$, ranging from 18 to 62 years with a median of 36; and behavioural risk score, $Z_3$, taking values 0-7, as defined in Flynn et al. (2005). According to the preliminary analysis as in Fig. 1, we assumed that the data can be described by model (1) on $[\tau_0, \tau_U] = [0.05, 0.15]$ and $[a, b] = [0.1, 0.2]$. We used the Epanechnikov kernel, and the grid points were kept the same as those in the simulation studies. We used the five-fold cross-validation method to select the optimal bandwidth, and obtained $h_{\mathrm{opt}} = 0.04$, as shown in Fig. S10 of the Supplementary Material.
The estimated curves for $\beta^*_{1\tau}(v)$, $\beta^*_{2\tau}(v)$ and $\beta^*_{3\tau}(v)$ with $\tau = 0.05$, 0.1 and 0.15 and their pointwise confidence bands are given in the Supplementary Material. We found that age and behavioural risk score had significant effects on the risk of infection, but the vaccine did not seem to be significantly related to the risk of HIV infection. For example, age had a significant negative effect for the mark larger than 0.14 at $\tau = 0.05$, but was nonsignificant over the whole mark interval for $\tau = 0.1$ and 0.15. The behavioural risk score had a negative decreasing effect over the whole mark interval at $\tau = 0.05$, while it showed a negative inverted U-shaped pattern for $\tau = 0.1$ and 0.15. For comparison, we also analysed the data with the method of Peng & Huang (2008) that does not consider the mark, and the comparison results are given in the Supplementary Material. It can be seen that the curves estimated by Peng and Huang's method were very different from ours. In particular, Peng and Huang's method showed that the covariate effects did not change with the quantiles, for example, the estimated values for $\beta^*_{1\tau}(v)$ were between 0.01991462 and 0.01991494, while our method indicated that the covariate effects varied with the quantiles.
The test statistic $T_{11}$ for $H_{10}$ versus $H_{11}$ yielded a p-value of 0.193. The p-value of $T_{12}(\tau)$ was larger than 0.798 for testing $H_{10}$ against $H_{12}$. These results suggested that the vaccine had no significant efficacy against HIV; see Fig. 2 for the plots of $\widehat{\mathrm{qve}}_\tau(v)$ and $\widehat{\mathrm{cqve}}_\tau(v)$.
In addition, we conducted the tests to evaluate whether the vaccine efficacy varied with the mark. The p-value for testing $H_{20}$ versus $H_{21}$ was 0.728 for $T_{21}$, while the p-value for testing $H_{20}$ against $H_{22}$ was larger than 0.793 for $T_{22}(\tau)$. This indicated that the vaccine efficacy had no varying tendency on the considered mark interval. These results were consistent with those obtained by Han et al. (2017).
This fact can also be confirmed in the aforementioned Fig. S1 in the Supplementary Material, where $\hat S_n\{\beta_\tau(v)\}$ approximates $U_n\{\beta_\tau(v)\}$ well and has a unique solution. In what follows, we propose to estimate $\beta^*_\tau(v)$ by the solution to $\hat S_n\{\beta_\tau(v)\} = 0$, denoted $\hat\beta_\tau(v)$.
For the alternatives $H_{21}$ and $H_{22}$, the departure from $H_{20}$ increases as the model moves from M3 to M4. The censoring time $C$ is generated from an exponential distribution with mean $c$, where $c$ is chosen to give a censoring rate of about 40% under models M1-M4. Set $[a, b] = [0.3, 0.8]$ and $[\tau_0, \tau_U] = [0.1, 0.4]$.

Fig. 2. Vaccine trial data analysis: the estimated curves of $\mathrm{qve}_\tau(v)$ and $\mathrm{cqve}_\tau(v)$ with $\tau = 0.05$, 0.1 and 0.15. The solid lines are the estimated functions, the dashed lines are the pointwise 95% confidence intervals, and the dash-dotted lines are the simultaneous 95% confidence bands.
The statistic $T_{21}$ captures the general departure $H_{21}$, while $T_{22}(\tau)$ is sensitive to the monotone alternative $H_{22}$, and is likely to be positive when $H_{22}$ holds. Hence, we reject $H_{20}$ if $T_{21} > c_{21}(\alpha)$ and $T_{22}(\tau) > c_{22}(\alpha)$, where $c_{21}(\alpha)$ and $c_{22}(\alpha)$ are the critical values. Under $H_{20}$, $T_{22}(\tau)$ is asymptotically standard normal, and $c_{22}(\alpha)$ can be taken as the upper $\alpha$-quantile $z_\alpha$ of the standard normal distribution. The critical value $c_{21}(\alpha)$ can be obtained through the following resampling technique. Let […]

Table 2. Empirical sizes and powers of tests $T_{11}$, $T_{12}(\tau)$, $T_{21}$ and $T_{22}(\tau)$ at the nominal level 0.05. The reported values are in percentages.